A Finite State Transducer (FST) based Font Converter
Abstract
This paper describes a rule-based approach to developing an Oriya font converter that converts text in the proprietary SAMBAD and AKRUTI fonts to standard Unicode. Such conversion supports electronic storage of information in the native language itself, as well as proper search and retrieval. Our approach relies mainly on the Apertium machine translation tool, which uses finite-state transducers to convert the symbolic data to standardized Unicode Oriya text. It requires two resources: a map table that maps the commonly used Oriya syllables in the proprietary font to their corresponding font codes, and a dictionary that specifies the rules for mapping those proprietary font codes to Unicode. Unhandled symbols that remain in the intermediate converted file are then corrected with the Flex scanner tool. The resulting text is in standard Unicode and remains unchanged across platforms, since Unicode is supported by almost all of them.
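The two-stage pipeline described in the abstract (a transducer driven by a map table, followed by a scanner-style cleanup pass) can be illustrated with a minimal Python sketch. It is not the authors' Apertium dictionary or Flex specification. The legacy keys in `MAP_TABLE` are hypothetical placeholders rather than real SAMBAD or AKRUTI codes, the function names are invented for illustration, and the single cleanup rule (moving a vowel sign typed before its consonant to after it) is an assumed example of the kind of residue a post-pass might handle.

```python
import re

# Hypothetical map table: legacy font codes -> Unicode Oriya.
# The keys are placeholders, NOT real SAMBAD or AKRUTI codes.
MAP_TABLE = {
    "k": "\u0B15",               # ORIYA LETTER KA
    "m": "\u0B2E",               # ORIYA LETTER MA
    "kx": "\u0B15\u0B4D\u0B37",  # conjunct: KA + VIRAMA + SSA
    "a": "\u0B3E",               # ORIYA VOWEL SIGN AA
    "e": "\u0B47",               # ORIYA VOWEL SIGN E
}


def build_trie(table):
    """Compile the map table into a trie; the key None marks an output."""
    root = {}
    for src, dst in table.items():
        node = root
        for ch in src:
            node = node.setdefault(ch, {})
        node[None] = dst
    return root


def transduce(text, trie):
    """Longest-match transduction; unmapped symbols pass through as-is."""
    out = []
    i = 0
    while i < len(text):
        node, j = trie, i
        best_out, best_end = None, i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                best_out, best_end = node[None], j
        if best_out is None:
            out.append(text[i])  # left for the cleanup pass
            i += 1
        else:
            out.append(best_out)
            i = best_end
    return "".join(out)


# Assumed cleanup rule standing in for the scanner pass: a vowel sign E
# emitted before a consonant cluster is moved to follow that cluster.
PREBASE_E = re.compile("(\u0B47)([\u0B15-\u0B39](?:\u0B4D[\u0B15-\u0B39])*)")


def cleanup(text):
    return PREBASE_E.sub(r"\2\1", text)


def convert(text, trie=build_trie(MAP_TABLE)):
    """Run the transducer, then the cleanup pass."""
    return cleanup(transduce(text, trie))
```

Longest-match lookup matters here because a conjunct such as the hypothetical key `kx` must win over the shorter key `k`; otherwise the second symbol would be left unconverted in the intermediate text.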
Similar resources
A Flexible, Scalable Finite-state Transducer Architecture for Corpus-based Concatenative Speech Synthesis
In this paper we describe our work involving the conversion of our phonologically-based synthesizer into a finite-state transducer (FST) representation which can be used for real-time natural-sounding synthesis. We have designed a transducer structure to efficiently perform the common task of unit selection in concatenative speech synthesis. By encapsulating domain-independent concatenative synt...
A flexible, scalable finite-state transducer architecture for corpus-based concatenative speech synthesis
In this paper we describe our work involving the conversion of our phonologically-based synthesizer into a finite-state transducer (FST) representation which can be used for real-time natural-sounding synthesis. We have designed a transducer structure to efficiently perform the common task of unit selection in concatenative speech synthesis. By encapsulating domain-independent concatenative synt...
N-gram FST Indexing for Spoken Term Detection
An efficient indexing scheme is essential for spoken term detection (STD) on large databases, particularly for phone-based systems, which have been widely adopted to achieve vocabulary-independent detection. While finite state transducer (FST) composition provides a standard indexing approach, n-gram reverse indexing is more flexible in connectivity representation and confiden...
Injection Structures Specified by Finite State Transducers
An injection structure A = (A, f) is a set A together with a one-place one-to-one function f. A is an FST injection structure if A is a regular set, that is, the set of words accepted by some finite automaton, and f is realized by a finite-state transducer. We initiate the study of FST injection structures. We show that the model checking problem for FST injection structures is undecidable whi...
Automated Essay Scoring Based on Finite State Transducer: towards ASR Transcription of Oral English Speech
Conventional Automated Essay Scoring (AES) measures may cause severe problems when applied directly to scoring Automatic Speech Recognition (ASR) transcriptions, as they are error-sensitive and unsuited to the characteristics of ASR transcription. Therefore, we introduce a Finite State Transducer (FST) framework to avoid these shortcomings. Compared with the Latent Semantic Analysis with Suppo...
Publication date: 2012